☒ AP ARIPO Patent: GH Ghana, GM Gambia, KE Kenya, LS Lesotho, MW Malawi, MZ Mozambique, SD Sudan,
SL Sierra Leone, SZ Swaziland, TZ United Republic of Tanzania, UG Uganda, ZM Zambia, ZW Zimbabwe, and any other
State which is a Contracting State of the Harare Protocol and of the PCT (if other kind of protection or treatment desired,
specify on dotted line)

☒ EA Eurasian Patent: AM Armenia, AZ Azerbaijan, BY Belarus, KG Kyrgyzstan, KZ Kazakhstan, MD Republic of Moldova,
RU Russian Federation, TJ Tajikistan, TM Turkmenistan, and any other State which is a Contracting State of the Eurasian
Patent Convention and of the PCT

☒ EP European Patent: AT Austria, BE Belgium, BG Bulgaria, CH & LI Switzerland and Liechtenstein, CY Cyprus, CZ Czech
Republic, DE Germany, DK Denmark, EE Estonia, ES Spain, FI Finland, FR France, GB United Kingdom, GR Greece,
IE Ireland, IT Italy, LU Luxembourg, MC Monaco, NL Netherlands, PT Portugal, SE Sweden, SI Slovenia, SK Slovakia,
TR Turkey, and any other State which is a Contracting State of the European Patent Convention and of the PCT

☒ OA OAPI Patent: BF Burkina Faso, BJ Benin, CF Central African Republic, CG Congo, CI Côte d'Ivoire, CM Cameroon,
GA Gabon, GN Guinea, GQ Equatorial Guinea, GW Guinea-Bissau, ML Mali, MR Mauritania, NE Niger, SN Senegal,
TD Chad, TG Togo, and any other State which is a member State of OAPI and a Contracting State of the PCT (if other kind
of protection or treatment desired, specify on dotted line)

National Patent (if other kind of protection or treatment desired, specify on dotted line):

☒ AE United Arab Emirates                  ☒ GM Gambia                                   ☒ NZ New Zealand
☒ AG Antigua and Barbuda                   ☒ HR Croatia                                  ☒ OM Oman
☒ AL Albania                               ☒ HU Hungary                                  ☒ PH Philippines
☒ AM Armenia                               ☒ ID Indonesia                                ☒ PL Poland
☒ AT Austria                               ☒ IL Israel                                   ☒ PT Portugal
☒ AU Australia                             ☒ IN India                                    ☒ RO Romania
☒ AZ Azerbaijan                            ☒ IS Iceland                                  ☒ RU Russian Federation
☒ BA Bosnia and Herzegovina                ☒ JP Japan
☒ BB Barbados                              ☒ KE Kenya                                    ☒ SC Seychelles
☒ BG Bulgaria                              ☒ KG Kyrgyzstan                               ☒ SD Sudan
☒ BR Brazil                                ☒ KP Democratic People's Republic of Korea    ☒ SE Sweden
☒ BY Belarus                                                                             ☒ SG Singapore
☒ BZ Belize                                ☒ KR Republic of Korea                        ☒ SK Slovakia
☒ CA Canada                                ☒ KZ Kazakhstan                               ☒ SL Sierra Leone
☒ CH & LI Switzerland and Liechtenstein    ☒ LC Saint Lucia                              ☒ TJ Tajikistan
☒ CN China                                 ☒ LK Sri Lanka                                ☒ TM Turkmenistan
☒ CO Colombia                              ☒ LR Liberia                                  ☒ TN Tunisia
☒ CR Costa Rica                            ☒ LS Lesotho                                  ☒ TR Turkey
☒ CU Cuba                                  ☒ LT Lithuania                                ☒ TT Trinidad and Tobago
☒ CZ Czech Republic                        ☒ LU Luxembourg
☒ DE Germany                               ☒ LV Latvia                                   ☒ TZ United Republic of Tanzania
☒ DK Denmark                               ☒ MA Morocco                                  ☒ UA Ukraine
☒ DM Dominica                              ☒ MD Republic of Moldova                      ☒ UG Uganda
☒ DZ Algeria                                                                             ☒ US United States of America
☒ EC Ecuador                               ☒ MG Madagascar
☒ EE Estonia                               ☒ MK The former Yugoslav Republic of Macedonia ☒ UZ Uzbekistan
☒ ES Spain                                                                               ☒ VC Saint Vincent and the Grenadines
☒ FI Finland                               ☒ MN Mongolia                                 ☒ VN Viet Nam
☒ GB United Kingdom                        ☒ MW Malawi                                   ☒ YU Yugoslavia
☒ GD Grenada                               ☒ MX Mexico                                   ☒ ZA South Africa
☒ GE Georgia                               ☒ MZ Mozambique                               ☒ ZM Zambia
☒ GH Ghana                                 ☒ NO Norway                                   ☒ ZW Zimbabwe

Check-boxes below reserved for designating States which have become party to the PCT after issuance of this sheet:

☒ NI Nicaragua    □    □

□    □
other designations which would be permitted under the PCT except any designation(s) indicated in the Supplemental Box as being
excluded from the scope of this statement. The applicant declares that those additional designations are subject to confirmation and that
any designation which is not confirmed before the expiration of 15 months from the priority date is to be regarded as withdrawn by the
applicant at the expiration of that time limit. (Confirmation (including fees) must reach the receiving Office within the 15-month time limit.)
Box No. VI PRIORITY CLAIM

The priority of the following earlier application(s) is hereby claimed:

Columns: Filing date of earlier application (day/month/year); Number of earlier application;
Where earlier application is: national application: country or Member of WTO; regional application:* regional Office;
international application: receiving Office

item (1)
item (2)
item (3)
item (4)
item (5)

□ Further priority claims are indicated in the Supplemental Box.

The receiving Office is requested to prepare and transmit to the International Bureau a certified copy of the earlier application(s) (only
if the earlier application was filed with the Office which for the purposes of this international application is the receiving Office) identified
above as:

□ all items □ item (1) □ item (2) □ item (3) □ item (4) □ item (5) □ other, see Supplemental Box

* Where the earlier application is an ARIPO application, indicate at least one country party to the Paris Convention for the Protection of
Industrial Property or one Member of the World Trade Organization for which that earlier application was filed (Rule 4.10(b)(ii)):



Box No. VII INTERNATIONAL SEARCHING AUTHORITY

Choice of International Searching Authority (ISA) (if two or more International Searching Authorities are competent to carry out the
international search, indicate the Authority chosen; the two-letter code may be used):

ISA/

Request to use results of earlier search; reference to that search (if an earlier search has been carried out by or requested from the
International Searching Authority):

Date (day/month/year)    Number    Country (or regional Office)

Box No. VIII DECLARATIONS

The following declarations are contained in Boxes Nos. VIII (i) to (v) (mark the applicable
check-boxes below and indicate in the right column the number of each type of declaration):    Number of declarations

□ Box No. VIII (i)    Declaration as to the identity of the inventor

□ Box No. VIII (ii)   Declaration as to the applicant's entitlement, as at the international filing
                      date, to apply for and be granted a patent

□ Box No. VIII (iii)  Declaration as to the applicant's entitlement, as at the international filing
                      date, to claim the priority of the earlier application

□ Box No. VIII (iv)   Declaration of inventorship (only for the purposes of the designation of the
                      United States of America)

□ Box No. VIII (v)    Declaration as to non-prejudicial disclosures or exceptions to lack of novelty
Box No. IX CHECK LIST; LANGUAGE OF FILING



26 



This international application contains:

(a) in paper form, the following number of sheets:

    request (including declaration sheets) : 5
    description (excluding sequence listings and/or tables related thereto) : 17
    claims : 3
    abstract : 1
    drawings : 2
    Sub-total number of sheets :
    sequence listings :
    tables related thereto :
    (for both, actual number of sheets if filed in paper form, whether or not also filed in
    computer readable form; see (c) below)
    Total number of sheets : 28

(b) □ only in computer readable form (Section 801(a)(i))
    (i) □ sequence listings
    (ii) □ tables related thereto

(c) □ also in computer readable form (Section 801(a)(ii))
    (i) □ sequence listings
    (ii) □ tables related thereto

Type and number of carriers (diskette, CD-ROM, CD-R or other) on which are contained the
    □ sequence listings:
    □ tables related thereto:
(additional copies to be indicated under items 9(ii) and/or 10(ii), in right column)



Figure of the drawings which should accompany the abstract:

This international application is accompanied by the following item(s) (mark the applicable
check-boxes below and indicate in right column the number of each item):    Number of items

1. ☒ fee calculation sheet
2. □ original separate power of attorney
3. □ original general power of attorney
4. □ copy of general power of attorney; reference number, if any:
5. □ statement explaining lack of signature
6. □ priority document(s) identified in Box No. VI as item(s):
7. □ translation of international application into (language):
8. □ separate indications concerning deposited microorganism or other biological material
9. □ sequence listings in computer readable form (indicate type and number of carriers)
   (i) □ copy submitted for the purposes of international search under Rule 13ter only
       (and not as part of the international application)
   (ii) □ (only where check-box (b)(i) or (c)(i) is marked in left column) additional copies
       including, where applicable, the copy for the purposes of international search under Rule 13ter
   (iii) □ together with relevant statement as to the identity of the copy or copies with the
       sequence listings mentioned in left column
10. □ tables in computer readable form related to sequence listings (indicate type and number of carriers)
   (i) □ copy submitted for the purposes of international search under Section 802(b-quater) only
       (and not as part of the international application)
   (ii) □ (only where check-box (b)(ii) or (c)(ii) is marked in left column) additional copies
       including, where applicable, the copy for the purposes of international search under
       Section 802(b-quater)
   (iii) □ together with relevant statement as to the identity of the copy or copies with the
       tables mentioned in left column
11. □ other (specify):

Language of filing of the international application: English
1. Date of actual receipt of the purported
international application:

3. Corrected date of actual receipt due to later but
timely received papers or drawings completing
the purported international application:



[it ft 03) 



4. Date of timely receipt of the required
corrections under PCT Article 11(2):

5. International Searching Authority
(if two or more are competent): ISA /

□ Transmittal of search copy delayed until search fee is paid

2. Drawings:
□ received:
□ not received:



For International Bureau use only

Date of receipt of the record copy
by the International Bureau:
Method and Device for providing speech-enabled input
in an electronic device having a user interface



The present invention relates to multimodal interactive browsing on electronic devices and
portable terminals and in communication networks. More specifically, the invention relates to a
simple multi-modal user interface concept, by offering a close guidance of possible voice data-
input and voice browsing as an entry alternative to use manual input. Moreover, the invention is
related to checking the preliminary conditions that should be fulfilled for valid voice input.

In multimodal applications, users can interact with other input modalities than only the keypad.
For example, commands that are traditionally given by scrolling and clicking can be speech-
enabled in the application so that the user can speak the commands, which will then be
recognized by an automatic speech recognition engine. Adding speech interaction to visual
applications receives growing interest as the enabling technologies are maturing, since in many
mobile scenarios using the keypad is difficult, for example when driving or walking.

Until now different multimodal browsing architectures have already been proposed. For example
the document US 6101473 describes a method, where voice browsing is realized by synchronous
operation of a telephone network service and an internet service. This is definitively prohibitive
due to the waste of network resources, requiring two different communication links. Further, the
service requires an interconnection between the telephone service and the internet service.
Another hurdle for user satisfaction is that the over-the-air co-browser synchronization required
in a distributed browser architecture may cause latencies in browser operation which will degrade
the user experience.

The document US 6188985 describes a method in which a wireless control unit implements the
voice browsing capabilities to a host computer. For this purpose, a number of multimodal
browser architectures have been proposed where these operations are placed on a network server.

The patent US 6374226 describes a system that is capable of changing the speech recognition
grammar dynamically. For example, when an E-mail program goes to the composition mode, a
new grammar set up is dynamically activated. This includes on one hand an improved use of
device resources, but also includes the severe disadvantage that the device changes its "passive
vocabulary". This may lead to frustrating experiences as the user who has learned that the device
understands a certain expression may be faced with a device feigning deafness for its input when
running another application.

The known systems suffer from the fact that users are not very keen to take the speech-enabled 
features into use. Another problem arising from the state of the art is that users may not always 
be aware of the operation status of speech enabled browsing systems. 



While there are standards being developed for how to write multimodal applications, there are no
standards as to how the application interface should be built so that it would be as easy as
possible for the user to become aware that speech input can be used.

Especially in devices and applications it would be desirable for a user to know which particular
voice input is allowed at different times or under certain conditions.

But when a user has put a speech recognition system successfully into use, it is probable that the
user also continues to use it. In other words, there is a hurdle in starting to use speech control.

The problem has been solved earlier by audio prompts etc., but these become annoying very
quickly, which degrades the usability experience.

Moreover, due to system load or the behavior of applications, all speech control options may not
be available at all times, which is very difficult to convey to the user using prior art techniques.

All the above approaches for a multimodal browsing architecture have in common that they are
not suitable for use in mobile electronic devices or terminals such as mobile phones, or handheld
computers, due to low computing power, restricted resources or low battery capacity.

So it would be desirable to have a multimodal browsing system that is speech-enabled and
provides superior user-friendliness.

According to a first aspect of the present invention, there is provided a method for multimodal
interactive browsing, comprising the steps of activating a multimodal user interaction,
comprising at least one key input option and at least one voice input option, displaying the at
least one key input option, checking if there is at least one condition affecting said voice input
option, and providing voice input options and displaying indications of said provided voice input
options according to said condition.
The activation of a multimodal user interaction in which at least one key input option and
conditionally at least one voice input option is provided, can be provided by at least switching on
the device or by activating a respective menu or respective settings.

In said multimodal browsing key input options are unconditionally provided and said at least one
voice input option is conditionally provided. Said at least one voice input option is not provided,
if at least one condition that could possibly interfere with voice input is fulfilled. The condition
can be, e.g. ambient noise or a too low signal to noise ratio at the audio input. The condition can
be, e.g. too low processing power, or battery status. The condition can e.g. be a too low voice
transfer capability in case of a distributed speech input / recognition system. The condition can be
restricted device resources. It should be noted that the conditions affecting the voice recognition
feature can be caused by a combination of the above conditions.

The at least one key input option is displayed on a display of said electronic device or mobile
terminal device, as in the case of conventional devices and conventional browsing.

The method is characterized by checking if at least one condition affecting the voice input is
fulfilled and providing said at least one voice input option and displaying indications of said
voice input options on said display, in case that none of said conditions is fulfilled. The checking
can be performed e.g. every second, in faster intervals or continually. The checking can also be
performed in an event controlled manner, wherein the check is only performed if an event is
detected that is indicative of an impossible voice input.
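
The conditional provision of voice input options described above can be illustrated with a minimal
Python sketch. It is not taken from the application: the class DeviceStatus, the function names and
all threshold values are hypothetical and chosen only to make the principle concrete (key input
options are always shown, voice input options only while no interfering condition is fulfilled).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DeviceStatus:
    """Hypothetical snapshot of values the description names as conditions."""
    snr_db: float      # signal to noise ratio at the audio input
    cpu_free: float    # fraction of processing power still available (0..1)
    battery: float     # battery charge (0..1)
    link_kbps: float   # voice transfer capability to a remote recognizer


# Each check returns True when the condition is fulfilled, i.e. when it
# would interfere with voice input. All thresholds are illustrative only.
INTERFERING_CONDITIONS: Dict[str, Callable[[DeviceStatus], bool]] = {
    "ambient noise": lambda s: s.snr_db < 10.0,
    "low processing power": lambda s: s.cpu_free < 0.2,
    "low battery": lambda s: s.battery < 0.1,
    "low transfer capability": lambda s: s.link_kbps < 8.0,
}


def fulfilled_conditions(status: DeviceStatus) -> List[str]:
    """Return the names of all conditions that currently block voice input."""
    return [name for name, check in INTERFERING_CONDITIONS.items() if check(status)]


def visible_options(key_options: List[str], voice_options: List[str],
                    status: DeviceStatus) -> Dict[str, List[str]]:
    """Key options are always provided; voice options only if nothing interferes."""
    blocking = fulfilled_conditions(status)
    return {
        "key": list(key_options),
        "voice": [] if blocking else list(voice_options),
        "blocked_by": blocking,  # could be shown to the user as an icon or text
    }
```

Calling visible_options once per second would correspond to the interval-based checking; an event
controlled variant would call it only when one of the monitored values changes.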



If no such condition is fulfilled, the method can provide at least one voice input option and
display indications of said at least one available voice input option on said display. It is also
possible to display, if no such condition is fulfilled, the depiction or representation or indication
that a voice input option is present and that a voice input can actually be performed. The first part
describes the principle that a voice input can be made or is in the passive vocabulary of a voice
recognition engine and the second part describes that the voice recognition engine is active.

It is also possible to display a representation of the checked condition that is actually fulfilled and
interferes with the voice input option. This can be embodied as e.g. a kind of icon or text
indicating what kind of condition prevents the voice input and how it may be removed.

In multimodal applications where voice input can be given in addition to visual input (keypad),
the user must be made aware of when voice inputs are possible and also what is the
allowed input. This method suggests a transparent way of letting the user know exactly when
voice recognition is active and which voice commands are speech-enabled at any point.

Event mechanisms can also be used by the system to determine situations when speech
recognition is not available for unexpected reasons or when the application designer has
specified that a certain command or a command set is speech-enabled. All commands that are
speech-enabled at a certain moment will be marked with a suitable visual method, for example
coloring, to indicate to the user both the speaking moment and the allowed utterance.

The invention proposes to indicate dynamically by visual keywords or visual cues the elements
that can be voice controlled depending on the availability of voice control for each item. For
example, if the speech recognition engine cannot be used temporarily, or if only certain options
are available at a certain point in an application, only those options are highlighted on the screen.

It can also be marked when speech input is temporarily unavailable. It is also possible to mark
only entries that are not speech enabled. This is some kind of an inverse approach that can be
extended to some kind of switching between marking speech enabled input options directly and
marking not speech enabled input options, in dependence of the number of markings necessary.
This can be implemented straightforwardly: green: enabled and black: not enabled, and in an
inverse notation red: not speech enabled input options and black: speech enabled input options.

This invention suggests visual keywords or cues to indicate to the user what can be spoken and
also when the speech-enabling is on or off. When a visual command is speech-enabled, the
command itself is marked e.g. with a different color or a respective icon than the commands that
are not speech-enabled. When the speech-enabling is off the color or a respective icon of the
command changes dynamically back, and if speech-enabling is turned on again, the color or icon
will change again. This marking will immediately indicate to the user what can be said and when.
The method can be combined with an input prediction method to sort frequently used input
options to the top of the list.
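
The switching between direct and inverse marking in dependence of the number of markings
necessary can be sketched as follows. This is a minimal illustration under stated assumptions: the
function name mark_options is hypothetical, and the rule of using the direct notation when both
notations need the same number of markings is an assumption, as the description does not specify it.

```python
from typing import Dict, List, Set


def mark_options(options: List[str], speech_enabled: Set[str]) -> Dict[str, str]:
    """Assign a color to every option, choosing the notation with fewer markings.

    Direct notation:  green = speech enabled, black = not speech enabled.
    Inverse notation: red = not speech enabled, black = speech enabled.
    """
    enabled = [o for o in options if o in speech_enabled]
    disabled = [o for o in options if o not in speech_enabled]

    if len(enabled) <= len(disabled):
        # Fewer (or equally many) enabled entries: mark those directly.
        return {o: ("green" if o in speech_enabled else "black") for o in options}
    # Fewer disabled entries: mark only the entries that cannot be spoken.
    return {o: ("black" if o in speech_enabled else "red") for o in options}
```

Because green appears only in the direct notation and red only in the inverse notation, the user can
tell from the marking color which notation is currently in use.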

The reasons why the speech-enabling of a command might change while the user stays on the
same screen can be, for example, the following:

- System error: connection to the speech recognizer is cut off unexpectedly,

- Change of environment: the device detects too much background noise for the recognition
to work properly,

- System is currently doing some action during which it cannot listen at the same time
because of system or application limitations, exhausted or exhaustively used system
resources e.g. fetching data for the user, and

- Application designer's choice, described more closely in the following paragraph.

Different applications may choose different recognition grammars and vocabulary to enable
speech in different manners, and the usage can vary even within one application. For example, if
on one screen the user can do several different actions (each including 2-3 choices of a menu),
the order of which does not matter, it is reasonable to allow the user to speak any of the options.
On the next screen, there may again be several actions, but this time the order is not totally free.
It is best to guide the user's speech input by making the order of actions explicit with the visual
speech-enabling cue that is chosen, highlighting the actions at their proper time.
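
The difference between a screen with free order and a screen with fixed order of actions can be
sketched as follows. The function enabled_actions and the simplification that a fixed order enables
exactly one action at a time are assumptions made for illustration only.

```python
from typing import List, Set


def enabled_actions(actions: List[str], done: Set[str], ordered: bool) -> List[str]:
    """Return the actions to highlight as speech enabled on the current screen."""
    pending = [a for a in actions if a not in done]
    if not ordered:
        return pending       # order does not matter: any remaining action may be spoken
    return pending[:1]       # fixed order: only the next action is highlighted
```

For a screen with the actions "choose size", "choose color" and "confirm", a free order highlights
all three at once, while a fixed order highlights "choose color" only after "choose size" is done.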

However, in a totally eyes-free situation, where voice is the only available modality, this
invention cannot be used as the only cue to the user. Some auditory keywords would be required
to indicate to the user when (and/or what) the user can speak. One way to indicate that a speech
recognition is actually available can be implemented by a vibration alarm prompt. The vibration
alarm prompt can comprise a single vibration as a start signal, and a short double vibration as a
stop signal.

In an example embodiment said displayed indications of voice input options comprise keywords.
The keywords can visualize available voice input or control options. The keywords can comprise
any kind of cues or hints to the actual speech input that may be not displayable (such as
whistling, humming or such sounds).

In another example embodiment said displaying of indications of said voice input options on said
display further comprises displaying, if a speech recognition is actually possible. As already
described above that is the recording or recognition state of a speech or voice recognition engine.
This can be described as 'recording' or 'recognizing' sign.

In another example embodiment of the present invention said displaying of indications of voice
input options comprises displaying said voice input options itself. That is, the input options are
depicted as the verbatim of the words to be spoken for the voice input. The wording "input
option" has been carefully chosen not to restrict the indication or the input option to any kind of
specific form.




PCT/IB03/01262 






In another example embodiment of the present invention said displaying of indications of said
voice input options on said display, is provided with a hysteresis. The use of a hysteretical
behavior helps to avoid fast changes on the indication of the availability of said voice input
options, in case that one of said checked conditions is near a threshold between interfering and not
interfering with said voice input feature. The hysteresis can be implemented in the checking or the
program performing the check, or in the application performing the indication.

In another example embodiment of the present invention said displaying of indications of said
voice input options on said display, is provided with a backlog function. As in the case of the
hysteresis the backlog function can be used to determine and eliminate fast changing conditions
that may cross a threshold value related to a condition (e.g. even overriding the hysteresis) to
prevent the user from being confused by a rapidly changing voice input ability or voice input
options. A backlog functionality can be implemented by a storage for storing the checking results
of the last 'x' seconds and a deactivation of a voice input option, as long as a single "over
threshold value" entry is present in said backlog file. As in the case of the hysteresis, the backlog
function can be implemented in the display application or in the checking application. In both
cases, the information conveyed to the user is made independent from small changes in the
vicinity of a threshold and from fast changes.
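
The hysteresis and the backlog function can be sketched in Python as shown below. Both classes are
hypothetical illustrations rather than the implementation of the application: a noise level is used
as the checked condition, and the threshold values and the length of the backlog window are free
parameters chosen by the caller.

```python
from collections import deque
from typing import Deque, Tuple


class HysteresisIndicator:
    """Voice input turns off above `high` and turns back on only below `low`."""

    def __init__(self, low: float, high: float) -> None:
        assert low < high
        self.low, self.high = low, high
        self.available = True

    def update(self, noise: float) -> bool:
        if self.available and noise > self.high:
            self.available = False
        elif not self.available and noise < self.low:
            self.available = True
        return self.available


class BacklogIndicator:
    """Voice input stays off while any over-threshold result is in the backlog."""

    def __init__(self, threshold: float, window_s: float) -> None:
        self.threshold, self.window_s = threshold, window_s
        # Each entry is (timestamp in seconds, was the value over the threshold?).
        self.backlog: Deque[Tuple[float, bool]] = deque()

    def update(self, now: float, noise: float) -> bool:
        self.backlog.append((now, noise > self.threshold))
        # Discard checking results that are older than the backlog window.
        while self.backlog and now - self.backlog[0][0] > self.window_s:
            self.backlog.popleft()
        return not any(over for _, over in self.backlog)
```

With the hysteresis, values between the two thresholds never change the indication; with the
backlog, a single over-threshold result keeps the voice input option deactivated until that result
has left the stored window.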



According to yet another aspect of the invention, a software tool is provided comprising program
code means for carrying out the method of the preceding description when said program product
is run on a computer, a network device or a mobile terminal device.

According to another aspect of the present invention, a computer program product downloadable
from a server for carrying out the method of the preceding description is provided, which
comprises program code means for performing all of the steps of the preceding methods when
said program is run on a computer, a network device or a mobile terminal device.

According to yet another aspect of the invention, a computer program product is provided
comprising program code means stored on a computer readable medium for carrying out the
methods of the preceding description, when said program product is run on a computer, a
network device or a mobile terminal device.

According to another aspect of the present invention a computer data signal is provided. The
computer data signal is embodied in a carrier wave and represents a program that makes the
computer perform the steps of the method contained in the preceding description, when said
computer program is run on a computer, a network device or a mobile terminal device.

The computer program and the computer program product may be distributed in different parts
and devices of the network. The computer program and the computer product device run in
different devices e.g. terminal device and remote speech recognition engine of the network.
Therefore, the computer program and the computer program device have to be different in
abilities and source code.

According to yet another aspect of the present invention a mobile terminal device for executing
simulated communication is provided. The terminal device comprises a central processing unit, a
display, a key based input system, a microphone and a data access means.



The central processing unit CPU, is provided to execute and run applications on said mobile
terminal. The display is connected to said CPU, to display visual content received from said
CPU. The key based input system is connected to said CPU, to provide a key input feature that
can provide key input options displayed on said display. The microphone is connected to said
CPU, to provide a conditional voice input feature. The data access means is connected to said
CPU, to handle data and to exchange data required for the operation of the CPU. In the simplest
case the data access means is a storage and in more sophisticated embodiments the data access
means can comprise e.g. a modem for a network access.

The CPU is configured to perform multimodal browsing via said display, said key based input
system and said microphone. The CPU is configured to continually monitor conditions that
interfere with said voice input and to provide said voice input feature, and display an indication
of a voice input option of said voice input feature on said display, in case no such condition is
fulfilled.

According to yet another aspect of the present invention a speech recognition system is provided
that is capable of multimodal user interaction. The speech recognition system comprises at least
one central processing unit, a display, a key-based input system, a microphone, and a data bus.
Said display is connected to said central processing unit to be controlled by said central
processing unit (CPU). Said key-based input system is operably connected to said central
processing unit, to provide a key input feature providing key input options that can be displayed
on said display. The microphone is operably connected to said at least one CPU to provide an
audio-electronic converter to make voice input accessible to said CPU. The data bus is operably
connected to said at least one CPU, to handle data and to exchange data required for the
operation of the said at least one CPU.
Said at least one CPU comprises a first central processing unit and a second processing unit. Said
first processing unit of said at least one CPU is configured to control multimodal interaction via
said display, said key based input system and said microphone. Said first processing unit is
further configured to monitor conditions that affect said voice input and to control and display an
indication of a voice input option of said voice input feature on said display according to said
condition. Said second central processing unit of said at least one CPU is configured to provide
said voice input feature.



In another example embodiment of the present invention the first central processing unit and the
second central processing unit of the at least one CPU are comprised in the same device.

In yet another example embodiment of the system the first central processing unit and the second
central processing unit of the at least one CPU are comprised in different interconnected devices.
The interconnection can be provided by an audio telephone connection. The interconnection can
also be provided by a data connection such as GPRS (General Packet Radio Service), Internet, LAN
(Local Area Network) and the like.

In another example embodiment said mobile electronic device further comprises a mobile
telephone.



In the following, the invention will be described in detail by referring to the enclosed drawings, in
which:

Figure 1 is a flowchart of a method for dynamically indicating speech-enabling status to the user
in multimodal mobile applications according to one aspect of the present invention,

Figure 2 is an example of an electronic device being capable of dynamically indicating speech-
enabling status to the user for multimodal browsing,

Figure 3 is an example of a display comprising different indications of visual input options and
their actual possible input state, and

Figures 4A and 4B are examples of a distributed speech recognition system being capable of
dynamically indicating speech-enabling status to the user for multimodal browsing.
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9 



Figure 1 is a flowchart of a method for dynamically indicating speech-enabling status to the user
in multimodal mobile applications according to one aspect of the present invention. The method
starts with the activation of a multimodal browsing 4. The expression 'multimodal browsing' is
used to describe the possibility to interact with the device in different modes, i.e. the device can
produce output in different modes, e.g. a visual mode or an audible mode. Multimodal browsing can
also include different input modes such as cursor or menu keys and/or alphanumerical keyboards,
voice recognition or eye tracking. In the present figures a system with key and voice input
capabilities is chosen as an example to visualize the nature of the present invention. Following or
simultaneously with the activation of the multimodal browsing, a monitoring or a surveying of
the available input capabilities is started. The surveillance can be embodied by directly and
repeatedly surveying the conditions that influence the speech recognition. The surveillance can
also be embodied by a kind of indirect survey, by implementing sub-algorithms in the respective
applications that operate with a parameter that influences the speech recognition, each of which
posts a signal or a message to the voice input application that a voice input is (probably) not
possible. Such an approach can be described as an event based approach.
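
The event based approach described above can be illustrated by the following minimal sketch. It is
not part of the disclosed embodiments; the class name VoiceInputMonitor, the method post and the
condition names are assumptions chosen for illustration only. Each application posts a message when
a condition it controls starts or stops blocking the voice input, and the monitor notifies its
listeners only when the overall availability actually changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VoiceInputMonitor:
    """Hypothetical collector of condition messages posted by applications."""
    blocking: Dict[str, bool] = field(default_factory=dict)
    listeners: List[Callable[[bool], None]] = field(default_factory=list)

    def available(self) -> bool:
        # Voice input is treated as possible only if no condition blocks it.
        return not any(self.blocking.values())

    def post(self, condition: str, blocks_voice: bool) -> None:
        # Called by an application whose parameter influences recognition.
        before = self.available()
        self.blocking[condition] = blocks_voice
        after = self.available()
        if after != before:
            for listener in self.listeners:
                listener(after)  # e.g. redraw the voice input indication


if __name__ == "__main__":
    monitor = VoiceInputMonitor()
    monitor.listeners.append(lambda ok: print("voice input available:", ok))
    monitor.post("background_noise", True)
    monitor.post("background_noise", False)
```

A polling variant would instead call available() at fixed intervals; the event based variant avoids
redrawing the display when nothing has changed.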



A possible condition is for example the actually available processing power. In case of a
distributed voice input system a condition can be the connection properties such as bandwidth,
signal to noise ratio or the like. Another condition comprises the ambient or background noise
which influences the speech recognition abilities.



From these example conditions it can be derived how probable it is that a voice or speech input
would be recognized. Therefore it can be derived whether the voice input feature is actually
available or not. It should be noted that the ability to recognize certain voice inputs may vary
with the condition. For example, a background noise that comprises a sound signal that can be
detected every second need not necessarily disturb the input of very short voice inputs, whereas
voice inputs longer than a second can not be recognized because of the noise event.
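
As a hedged illustration of how availability might be derived from such conditions, the following
sketch combines the example conditions named above. The field names, the threshold values
(min_cpu_free, min_snr_db) and the simple "quiet gap" model of a periodic noise are assumptions made
for this example and are not taken from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Conditions:
    cpu_free: float                  # fraction of processing power free, 0.0 to 1.0
    snr_db: float                    # signal to noise ratio of the voice channel
    noise_period_s: Optional[float]  # spacing of a periodic noise event, None if absent
    noise_burst_s: float = 0.0       # length of each noise event


def quiet_gap_s(c: Conditions) -> float:
    """Longest undisturbed interval between two periodic noise events."""
    if c.noise_period_s is None:
        return float("inf")
    return max(c.noise_period_s - c.noise_burst_s, 0.0)


def voice_option_available(c: Conditions, utterance_s: float,
                           min_cpu_free: float = 0.2,
                           min_snr_db: float = 10.0) -> bool:
    """True if an utterance of the given length is likely to be recognized."""
    if c.cpu_free < min_cpu_free or c.snr_db < min_snr_db:
        return False
    # A short utterance fits between two noise events, a long one does not.
    return utterance_s <= quiet_gap_s(c)
```

With a noise event recurring every second, a half-second keyword would still be offered under this
model, while a two-second phrase would be marked as unavailable.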



In a next step a visual content is depicted 12 according to said monitored and evaluated input
capabilities. That means that input options are depicted on a display of said electronic device or
said mobile terminal device. Due to the usually restricted information content of a small mobile
display, it should be clear that usually not all possible input options can be depicted on the
display simultaneously. It should be noted that the unavailability of a voice input can also be
depicted.



The user can simply perceive the available and possible speech inputs and can browse the
elements depicted on the display by using either speech input or key input 16. When performing
multimodal browsing a new display content can be called and depicted, wherein the new content
is also provided with speech input keywords or cues and the like which are dynamically
generated by surveying and evaluating the multimodal browsing conditions (i.e. speech input /
eye tracking / recognition conditions).



The method ends with the deactivation of the multimodal browsing 18. With the end of the
multimodal browsing, the surveillance of the multimodal input conditions can also be stopped or
interrupted. A direct connection from the boxes 8 or 12 to 18 has been omitted from the figure, as
the termination of the multimodal browsing is performed by a user input. In case of an automatic
shutdown (e.g. a low battery power shutdown), the device can directly jump from 8 or 12 to 18.
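
The control flow of Figure 1 can be sketched as a simple loop. This is an illustration under
assumptions: the description names boxes 4, 12, 16 and 18 explicitly, while box 8 is assumed here to
be the monitoring step; the function name run_session and the event names "select", "quit" and
"shutdown" are hypothetical.

```python
from typing import Iterable, List, Tuple


def run_session(steps: Iterable[Tuple[bool, str]]) -> List[str]:
    """Trace the boxes visited for a sequence of (voice_available, event) pairs.

    event is "select" (user browses on), "quit" (user ends browsing at 16)
    or "shutdown" (automatic shutdown, jumping straight to 18).
    """
    trace = ["4"]                      # activation of multimodal browsing
    for voice_available, event in steps:
        trace.append("8")              # monitor conditions (assumed meaning of box 8)
        if event == "shutdown":
            break                      # direct jump to 18
        trace.append("12:voice-on" if voice_available else "12:voice-off")
        trace.append("16")             # user browses by speech or key input
        if event == "quit":
            break
    trace.append("18")                 # deactivation of multimodal browsing
    return trace
```

The loop makes explicit that the depicted content (box 12) is regenerated from the monitored
conditions on every pass, so the indication can change between two user inputs.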

As usability tests have indicated, the learning curve of users in using speech is steep in that users
adopt the speech interaction rather quickly and fluently after the first successful attempts.
However, there is a high threshold to overcome before the learning can start. In other words,
users do not usually realize that speech input is available unless explicitly told so. Moreover, it
takes time and courage for them to try the speech command if they are not sure about what they
can say. After trial and success many start to even favor the speech input modality when making
routine selections. After trial and error it may happen that users simply ignore any speech input
ability.

The tasks where speech can be used in visual applications can be divided into two categories: 

1) speech-enabling existing visual commands (selecting links, radio buttons, etc.) 

2) allowing actions for which there is no visual equivalent (e.g. shortcuts =
utterances combining several commands allowing the user to bypass hierarchical
selections, or allowing the user to enter text, as in dictation)

This invention focuses mainly on category 1, indicating to the user what is speech-enabled and
when at different points in the application. In category 2 type tasks, this invention allows
indication to the user when speech input is possible, by selecting the implementation suitably, but
what exactly can be said in these tasks is out of the scope of this invention, except in case of a
combination with a speech input prediction system wherein the borders between both
categories become blurred.



To lower the threshold to use voice input and multimodal browsing, a small demo version
embodied in the electronic device or terminal can be embodied as some kind of a language lab,
wherein the phone demonstrates in a replayed dialogue a typical input scenario with pre-recorded






speech inputs and input actions. For example: "To select the actual Battery status say 'say
Fuelstate' repeat: and the requested information is read out loud 'Battery power at 25%'", or
"To select the actual Battery status say 'show Fuelstate' and the requested information is
depicted on the display", wherein both actions can be accompanied by the respective output.



In combination with a basic cursor based voice navigation system and speech recognizable words
like 'right', 'left', 'up', 'down', 'click', 'doubleclick', 'clickclick', 'hold', 'delete' and 'select' a
voice access can be provided even to voice-unable menu structures. The indication of a voice
enabled speech navigation system can be provided by a mouth icon surrounded by the respective
action icons or a mouth shaped cursor. In case of the selection of a gaming application by
browsing via a menu (say "upupupupupclick" or "game") the possible speech input features are
highlighted by a teeth / mouth icon or a snake icon to select the game "snake" (say
"downdownclick" or "snake").
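
Concatenated navigation utterances such as "upupupupupclick" can be split into single cursor
commands once they have been recognized as text. The following sketch is one possible way to do
this and is not prescribed by the description; the function name split_commands and the greedy
longest-match strategy are assumptions, while the vocabulary is the word list given above.

```python
from typing import List, Sequence

VOCABULARY = ("right", "left", "up", "down", "click", "doubleclick",
              "clickclick", "hold", "delete", "select")


def split_commands(utterance: str,
                   vocabulary: Sequence[str] = VOCABULARY) -> List[str]:
    """Split a concatenated utterance into known commands, longest match first."""
    words = sorted(vocabulary, key=len, reverse=True)
    commands: List[str] = []
    pos = 0
    while pos < len(utterance):
        for word in words:
            if utterance.startswith(word, pos):
                commands.append(word)
                pos += len(word)
                break
        else:
            # No vocabulary word starts here, so this is not a navigation command.
            raise ValueError(f"unrecognized input at position {pos}: {utterance[pos:]!r}")
    return commands
```

Trying the longest words first keeps 'clickclick' and 'doubleclick' from being read as two separate
'click' commands. A keyword such as "game" is not a navigation command and would be handled by the
keyword recognizer instead.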



Figure 2 is an example of an electronic device or a terminal being capable of dynamically
indicating speech-enabling status to the user for multimodal browsing. The device is depicted
with a user interface as it is known from a mobile telephone. The mobile device is capable of
executing multimodal interactive browsing, and comprises a user interface with input and output
means such as the display 82, the keys 84 and 84', a microphone 86 and a loudspeaker 88. The
user interface can be used for multimodal browsing comprising audio and key input and audio
and display output. All the elements of the user interface are connected to a central processing
unit CPU 80 to control the interaction of the user and the device.

The central processing unit is also connected to a data access means 90, to handle data and to
exchange data required for the operation of the CPU 80 or applications running on said CPU 80.
The CPU 80 is configured to perform multimodal browsing via said display 82, said key based
input system 84, 84' and said microphone 86, and possibly via said loudspeaker 88. The
availability or operability of the multimodal browsing depends on parameters or on
determined conditions. The CPU 80 can provide a multimodal browsing capability e.g. by
running voice recognition applications on the device.



The CPU 80 is further connected to a data access means to access data stored in a built-in storage
(not shown) or access data via e.g. a network connection 92, to provide said multimodal
browsing feature.



Said CPU 80 is further configured to monitor said conditions to continually determine the
availability of said voice input feature. The monitoring can be applied e.g. every second, in
shorter intervals or continuously, in dependence of the kind of parameters or the conditions that
are monitored or surveyed.

The determined availability of the voice input feature is then visually indicated on a display on
the basis of said determined availability.

In the case that the multimodal browsing is constant and independent of any external or internal
restrictions, the present invention can not be applied in a meaningful way: if there are no
changing parameters affecting the multimodal browsing, it is useless to monitor these
parameters, as changes in vocabulary or the voice input capability can not occur.

Figure 3 is an example of a display comprising different indications of visual input options and
their actual possible input state. There is depicted a display 58 of a mobile device that is
multimodal browsing enabled. On the right side of the display 58 a light emitting diode LED 60
is placed. The LED can be used to indicate that a voice recognition engine or module is actually
active or in a reception mode. The glowing, flashing or blinking LED 60 can indicate that the user
can talk to perform a user input or a user selection.

On the display there is depicted a usual list of selectable menu points "Menu option 1-4" 62.
Related to each of the menu options 62 there is depicted an icon 64, 68 indicating the possible
input modes. The "Menu options 1, 2 and 4" are provided with a mouth icon to indicate that
these input options are "voice inputable". The "Menu option 3" is provided with a finger icon to
indicate that the only available input option for this menu option is pressing a key.
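
The assignment of icons to menu options can be sketched as follows. The sketch is illustrative
only: the names MenuOption and render_menu are hypothetical, and the text markers "(mouth)" and
"(finger)" merely stand in for the icons 64 and 68 of Figure 3. A menu option is shown with the
mouth marker only if it is speech-enabled and the monitored conditions currently permit voice
input.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MenuOption:
    label: str
    voice_enabled: bool  # whether this option has a speech-enabled command at all


def render_menu(options: Sequence[MenuOption], voice_available: bool) -> List[str]:
    """Return one display line per option, prefixed by its input-mode marker."""
    lines = []
    for option in options:
        by_voice = option.voice_enabled and voice_available
        marker = "(mouth)" if by_voice else "(finger)"
        lines.append(f"{marker} {option.label}")
    return lines


# The menu of Figure 3: options 1, 2 and 4 are voice inputable, option 3 is key only.
FIGURE_3_MENU = [
    MenuOption("Menu option 1", True),
    MenuOption("Menu option 2", True),
    MenuOption("Menu option 3", False),
    MenuOption("Menu option 4", True),
]
```

Calling render_menu(FIGURE_3_MENU, False) would show the finger marker on every line, which
corresponds to depicting the unavailability of the voice input.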



The "Menu option 2" is underlined to indicate the cursor position, i.e. that this option is
actually selectable by pressing an 'OK'-Button or by a voice input such as 'OK', 'click',
'doubleclick', 'clickclick' or 'select'.



The "Menu option 2" is depicted in bold letters to indicate that the "Menu option 2" is selectable
by voice inputting the words "Menu option 2". The word 'option' of the "Menu option 1" is
depicted in bold letters to indicate that the "Menu option 1" is selectable by voice inputting the
word 'option'. The syllable 'men' and the number '4' of the "Menu option 4" are depicted in
bold characters to indicate that the "Menu option 4" is selectable by voice inputting the words
'Men four', or a wording based on this abbreviation.
The icons 66, 70 on the bottom of the display 58 can also be used to indicate whether a voice
recognition engine or module is actually active and whether or not it is in a reception mode. The
icon 66, an open mouth, can indicate that the user can talk to perform a user input or a user
selection. The icon 70, closed lips sealed with a fingertip, can indicate that the voice input
option is actually not available.



The icons 66 and 70 and 64 and 68 can complement each other or exclude each other, as they
provide redundant information.

In addition to the icons, the following means can be used to denote when the user can speak:

- Spoken prompts can be played to the user, asking to speak an utterance ("Please choose /
say a category.")

- An earcon (auditory icon, e.g. a beep) can be played either alone or at the end of a prompt to
indicate that the user can start speaking

- The user can be allowed to control the speaking moment by clicking a special button to
activate recognition (so called push-to-talk or "PTT" button)

In order to indicate what the user can say, the following means can additionally be used: 

- Command lists are spoken to the user in the prompt ("Say 'Next', 'Previous', 'Back',
'Exit', or 'Help'")

- The prompt is designed to give implicit guidance to the user ("Do you want to go to Next
or Previous?")

- The prompt gives an example about what can be said ("Select a day and a time, for
example 'Monday at three'")



A spoken prompt is useful especially at the beginning of a session to remind the user about
speech interaction. However, since human beings can catch the content of a small mobile
screen visually faster than it takes to listen to a sentence, prompts easily tend to sound long and
tedious. Although barge-in (the user interrupts the system prompt by speaking) is usually allowed
in well-developed speech applications, users may be uncomfortable with speaking before the
system has stopped, since it is considered impolite in human-to-human conversations. A more
serious problem with spoken prompts is that the information in them is usually lost beyond
recovery if the user is not concentrating. Also, long command lists are not useful, since they
increase the user's memory load and boredom: nearly every computer generated monologue
lasting longer than 7 words or 3 seconds can readily be perceived as boring or annoying.



To summarize, while prompts are useful in making the situation more dialogue-like, they tend to
be too long and available only for a short time. Auditory icons are short but they are also
temporary signals. Visual cues for speaking that would stay visible on the screen to indicate
when speech is allowed, when it is not, and what exactly can be said, would be an easy and
transparent way to indicate speech-enabling to the user. Indicating when speech is allowed is also
an easy way to make users aware of the barge-in feature and encourage them to interrupt or
"vocally override" possible prompts.

Push-to-talk buttons, while allowing the user more control of the interaction, are not fully
without problems, either. The device has to have a separate button for voice activation, or the
user must be separately taught that a button serves as a push-to-talk button in some contexts. In
some mobile contexts, pressing even one button might be cumbersome, e.g. while riding on a
motorbike on the pillion.



Figures 4A and 4B are examples of a distributed speech recognition system being capable of
dynamically indicating speech-enabling status to the user for multimodal browsing.

Figure 4A is an example of a distributed speech recognition system being capable of dynamically
indicating speech-enabling status to the user for multimodal browsing, wherein said distributed
speech recognition system is integrated in a single device 77. The term "distributed speech
recognition" is used to indicate that the multimodal browsing and the speech recognition are
executed, at least, in different processing units of said single device 77.

The mobile device 77 comprises a speech recognition system that is capable of executing
multimodal interactive browsing, and comprises a user interface with input and output means
such as the display 82, the keys 84 and 84', a microphone 86 and a loudspeaker 88. The user
interface can be used for multimodal browsing comprising audio and key input and audio and
display output. All the elements of the user interface are connected to a central processing unit
CPU 80 to control the interaction of the user and the device.



The speech recognition system comprises at least one central processing unit 80, a display 82, a
key-based input system 84, 84', a microphone 86, and a data bus 91. Said display is connected to
said central processing unit to be controlled by said CPU 80. Said key-based input system 84, 84'
is operably connected to said central processing unit 80, to provide a key input feature providing
key input options that can be displayed on said display 82.






The microphone 86 is operably connected to said at least one CPU 80 to provide an audio-
electronic converter to make voice input accessible to said CPU 80. The data bus 91 is operably
connected to said at least one CPU 80, to handle data and to exchange data required for the
operation of the said at least one CPU 80. The data bus 91 operably connects said at least
one CPU 80 to an internal memory 83 to provide data access to stored data necessary to provide
said key input feature and/or said voice input feature. The internal memory 83 can e.g. store the
different conditions and combinations of conditions of the device in which the voice input feature
is accessible or not.



Said at least one CPU 80 comprises a first central processing unit 81 and a second central
processing unit 81'. Said first processing unit 81 of said at least one CPU 80 is configured to
control multimodal interaction via said display 82, said key based input system 84, 84' and said
microphone 86. Said first processing unit 81 is further configured to monitor conditions that
affect said voice input and to control and display an indication of a voice input option of said
voice input feature on said display 82 according to said monitored condition.



Figure 4B is an example of a distributed speech recognition system being capable of dynamically
indicating speech-enabling status to the user for multimodal browsing that is distributed between
at least two devices. A distributed voice recognition can offer the advantage that the
resources required for speech recognition can be economized in the small and e.g. portable
device 78.



To provide a distributed system, the CPU 80 has to be distributed between the two devices. The
first central processing unit 81 and the second central processing unit 81' of the at least one
CPU 80 are comprised in different interconnected devices 78 and 79. The interconnection 97
between the two devices (and of course the first central processing unit 81 and the second
central processing unit 81') can be provided by, e.g., a telephone connection. The interconnection
can also be provided by a data connection such as GPRS (General Packet Radio Service),
Internet, LAN (Local Area Network) and the like.



Said first central processing unit 81 alone can be configured to monitor said conditions to
continually determine the availability of said voice input feature. The monitoring can be applied
e.g. every second, in shorter intervals or continuously, in dependence of the kind of parameters or
the conditions that are monitored or surveyed.



The major advantage of the invention is that it can be applied to any kind of mobile electronic
device regardless of the features used. A user using an electronic device always under the best
voice control or multimodal browsing conditions will not recognize the presence of the present
invention. The present invention can be applied to any kind of voice control or voice input used
in technical applications. There is also a possibility to apply the present invention to a non
mobile system with no limitations in regard of resources. In a non mobile system the present
invention can be used to indicate the words that can be recognized with a probability of nearly
100% and words that can be recognized only with a lower recognition rate and therefore are not
to be regarded as being available (or requiring more training).
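
For the non mobile case, the split between words that are indicated as available and words that
require more training can be sketched as below. The function name split_by_recognition_rate, the
example rates and the threshold of 0.95 are assumptions for illustration; the description only
speaks of a probability of "nearly 100%".

```python
from typing import Dict, List, Tuple


def split_by_recognition_rate(rates: Dict[str, float],
                              threshold: float = 0.95) -> Tuple[List[str], List[str]]:
    """Partition words into (available, needs_training) by recognition rate."""
    available = sorted(word for word, rate in rates.items() if rate >= threshold)
    needs_training = sorted(word for word, rate in rates.items() if rate < threshold)
    return available, needs_training
```

Only the first list would be marked as speech-enabled on the display; the second list could be
offered in a training dialogue.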



The visual keyword or cue that is chosen to mark the speech-enabling could be a color scheme or
some other method, such as underlining. Underlining might easily be confused with a hyperlink,
however. Color would be a good choice, and color displays are becoming more and more
common. Red is typically used to mark active recording in audio applications, so it might be a
suitable choice to indicate that speech-enabling is on. Some traffic light scenario could also be
adopted. Animated icons may help to visualize that a longer action, e.g. a voice input, is possible
for a depicted element, such as ant colons, an animated sound spectrum monitor or a talking mouth.



The color system must be learnt as well, even if only two colors are used, one for speech-on and
the other for speech-off indications. A small legend describing the color usage might be visible
on the early screens of the application.



Instead of colors, the speech-enabled commands could be marked in some other way, for example by
drawing a small speech bubble around the command. The visual cue should be directly tied to the
command, however, to make the enabling method as transparent to the user as possible.



Changing the visual cue dynamically while on the same page can be done with a suitable event
mechanism. In the same way as the browser can highlight visual symbols in an XHTML
application when a suitable 'onclick' or 'onfocus' event is caught, new events can be defined for
cases that call for a change in the visual speech-enabling cue. When a multimodal mobile browser
catches these events, it would then change the color or other chosen visual cue in the corresponding
GUI elements as required.
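
Such an event mechanism can be illustrated by the following sketch, which models GUI elements and
listeners in plain Python rather than in an actual XHTML browser. The event names 'speechon' and
'speechoff', the class Element and the function speech_enable are hypothetical and are not defined
by XHTML or by the description; red is used for the speech-on cue only as an example color.

```python
from typing import Callable, Dict, List


class Element:
    """Minimal stand-in for a GUI element that can receive named events."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.color = "black"
        self._handlers: Dict[str, List[Callable[["Element"], None]]] = {}

    def add_event_listener(self, event: str,
                           handler: Callable[["Element"], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def dispatch(self, event: str) -> None:
        # Events without a registered handler are silently ignored.
        for handler in self._handlers.get(event, []):
            handler(self)


def _cue_on(element: Element) -> None:
    element.color = "red"      # speech-enabling is on


def _cue_off(element: Element) -> None:
    element.color = "black"    # speech-enabling is off


def speech_enable(element: Element) -> None:
    """Register the two hypothetical events that toggle the visual cue."""
    element.add_event_listener("speechon", _cue_on)
    element.add_event_listener("speechoff", _cue_off)
```

A monitoring component would dispatch 'speechoff' to all speech-enabled elements when a condition
blocks the voice input, and 'speechon' when it becomes available again, without reloading the page.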



With speech-enabled tasks that have no visual equivalent, some traffic light scheme can be used
to indicate when speech recognition is active or inactive. This is relatively easy to implement
with events that affect the whole screen at a time. One such measure can be to wobble the display
illumination, to invert the depiction mode, to selectively animate the voice enabled menu points,
or to let small balls jump from syllable to syllable as known from 'Karaoke' videos.



Additional features that can be combined with the present invention are e.g. input prediction,
training dialogues, voice input proposals via text or speech output, icon based menu structures
for illiterate people, trainable speech input, and read out user manuals employing a "read out"
and a "read in" key.



This application contains the description of implementations and embodiments of the present
invention with the help of examples. It will be appreciated by a person skilled in the art that the
present invention is not restricted to details of the embodiments presented above, and that the
invention can also be implemented in another form without deviating from the characteristics of
the invention. The embodiments presented above should be considered illustrative, but not
restricting. Thus the possibilities of implementing and using the invention are only restricted by
the enclosed claims. Consequently various options of implementing the invention as determined
by the claims, including equivalent implementations, also belong to the scope of the invention.
Claims

1. Method for indicating speech-enabled input for multimodal interaction in an electronic
device having a user interface, comprising:

activating a multimodal user interaction feature of said user interface in which at least one
key input option and at least one voice input option is provided,

displaying the at least one key input option on a display of said electronic device,
characterized by

checking, if at least one condition generally affecting voice input is fulfilled, and

providing said at least one voice input option and displaying indications of said voice input
options on said display according to said condition.

2. Method according to claim 1, wherein said displayed indications of voice input options
comprise keywords.

3. Method according to claim 1 or 2, wherein said displaying of indications of said voice
input options on said display further comprises displaying if a speech recognition is actually
possible.

4. Method according to any of the preceding claims, wherein said displaying of indications
of voice input options comprises displaying said voice input options.

5. Method according to any of the preceding claims, wherein said displaying of indications
of said voice input options on said display is provided with a hysteresis.

6. Method according to any of the preceding claims, wherein said displaying of indications
of said voice input options on said display is provided with a backlog function.

7. Software tool comprising program code means stored on a computer readable medium for
carrying out the method of any one of claims 1 to 6, when said software tool is run on a
computer or network device.

8. Computer program product comprising program code means stored on a computer
readable medium for carrying out the method of any one of claims 1 to 6, when said program
product is run on a computer or network device.






9. Computer program product comprising program code, downloadable from a server for
carrying out the method of any one of claims 1 to 6, when said program product is run on a
computer or network device.

10. An electronic device capable of executing multimodal interactive browsing, comprising:
a central processing unit CPU (80),
a display (82) connected to said CPU (80), to display visual content received from said
CPU (80) on said display (82),
a key-based input system (84, 84') operably connected to said CPU (80), to provide a key
input feature providing key input options displayed on said display,
a microphone (86) operably connected to said CPU (80), to provide a voice input feature, and
a data bus (90), operably connected to said CPU (80), to handle data and to exchange data
required for the operation of the CPU (80),
wherein said CPU (80) is configured to control multimodal interaction via said display (82),
said key based input system (84, 84') and said microphone (86), and
wherein said CPU (80) is configured to monitor conditions that affect said voice input, and to
provide said voice input feature and display an indication of a voice input option of said
voice input feature on said display (82) according to said condition.



11. Electronic device according to claim 10, further comprising a mobile communication
device.



12. A speech recognition system capable of multimodal interaction and having a user
interface, comprising:
at least one central processing unit CPU (80),
a display (82) connected to said CPU (80),
a key-based input system (84, 84') operably connected to said CPU (80), to provide a key
input feature providing key input options displayed on said display,
a microphone (86) operably connected to said at least one central processing unit (80),
a data bus (91), operably connected to said at least one CPU (80), to handle data and to
exchange data required for the operation of the said at least one CPU (80),
wherein a first central processing unit (81) of said at least one CPU (80) is configured to
control multimodal interaction via said display (82), said key based input system (84, 84')
and said microphone (86) and to monitor conditions that affect said voice input and to control
and display an indication of a voice input option of said voice input feature on said
display (82) according to said condition, and
wherein a second central processing unit (81') of said at least one CPU (80) is configured to
provide said voice input feature.

13. A system according to claim 12, wherein the first central processing unit (81) and the
second central processing unit (81') are comprised in the same device (77).

14. A system according to claim 12, wherein the first central processing unit (81) and the
second central processing unit (81') are comprised in different interconnected
devices (78, 79).
Abstract of the disclosure

The present invention provides a method, a device and a system for multimodal interactions. The
method according to the invention comprises the steps of activating a multimodal user
interaction, providing at least one key input option and at least one voice input option, displaying
the at least one key input option, checking if there is at least one condition affecting said voice
input option, and providing voice input options and displaying indications of said provided voice
input options according to said condition. The method is characterized by checking if at least one
condition affecting the voice input is fulfilled and providing said at least one voice input option
and displaying indications of said voice input options on said display, according to said
condition.